Homework 4

Instructions

In this assignment, you’ll turn to the Hospital Cost Report Information System (HCRIS) data. These data are described in detail in the HCRIS GitHub Repo.

The due date for initial submission is 4/7, the revision due date is 4/9, and the final due date is Friday, 4/11.

Once you have the data downloaded and the code running, answer the following questions.

Summarize the data

  1. How many hospitals filed more than one report in the same year? Show your answer as a line graph of the number of hospitals over time.

  2. After removing/combining multiple reports, how many unique hospital IDs (Medicare provider numbers) exist in the data?

  3. What is the distribution of total charges (tot_charges in the data) in each year? Show your results with a “violin” plot, with charges on the y-axis and years on the x-axis. For a nice tutorial on violin plots, look at Violin Plots with ggplot2.

  4. What is the distribution of estimated prices in each year? Again present your results with a violin plot, and recall our formula for estimating prices from class. Be sure to do something about outliers and/or negative prices in the data.

discount_factor = 1-tot_discounts/tot_charges
price_num = (ip_charges + icu_charges + ancillary_charges)*discount_factor - tot_mcare_payment
price_denom = tot_discharges - mcare_discharges
price = price_num/price_denom
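
As a rough illustration of the formula above, the sketch below applies it to a single hospital-year record in Python. The field names come from the formula; the function name estimate_price, the example record, and the rule of dropping undefined or non-positive prices are assumptions made for this sketch, not the required treatment of outliers.

```python
from typing import Mapping, Optional


def estimate_price(row: Mapping[str, float]) -> Optional[float]:
    """Apply the price formula to one hospital-year record.

    Returns None when the price is undefined or non-positive. That filter
    is an assumption of this sketch, not a rule from the assignment.
    """
    if row["tot_charges"] == 0:
        return None  # discount factor is undefined without charges

    discount_factor = 1 - row["tot_discounts"] / row["tot_charges"]
    price_num = (
        row["ip_charges"] + row["icu_charges"] + row["ancillary_charges"]
    ) * discount_factor - row["tot_mcare_payment"]
    price_denom = row["tot_discharges"] - row["mcare_discharges"]

    if price_denom <= 0:
        return None  # no non-Medicare discharges to spread payments over

    price = price_num / price_denom
    return price if price > 0 else None


# Made-up record, for illustration only (not real HCRIS values).
example = {
    "tot_charges": 1000.0,
    "tot_discounts": 500.0,
    "ip_charges": 500.0,
    "icu_charges": 100.0,
    "ancillary_charges": 200.0,
    "tot_mcare_payment": 100.0,
    "tot_discharges": 50.0,
    "mcare_discharges": 20.0,
}
print(estimate_price(example))  # 10.0
```

Returning None rather than a negative number keeps bad records easy to filter before plotting; trimming extreme positive prices (for example at a chosen percentile) would be a separate step.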

Estimate ATEs

For the rest of the assignment, we’ll use a regression discontinuity design to estimate the average treatment effect from receiving a marginally higher rating. We’ll focus only on 2010.

  1. Calculate the running variable underlying the star rating. Provide a table showing the number of plans that are rounded up into a 3-star, 3.5-star, 4-star, 4.5-star, and 5-star rating.

  2. Using the RD estimator with a bandwidth of 0.125, provide an estimate of the effect of receiving a 3-star versus a 2.5-star rating on enrollments. Repeat the exercise to estimate the effects at 3.5 stars, and summarize your results in a table.

  3. Repeat your results for bandwidths of 0.1, 0.12, 0.13, 0.14, and 0.15 (again for 3 and 3.5 stars). Show all of the results in a graph. How sensitive are your findings to the choice of bandwidth?

  4. Examine (graphically) whether contracts appear to manipulate the running variable. In other words, look at the distribution of the running variable before and after the relevant threshold values. What do you find?

  5. Similar to question 4, examine whether plans just above the threshold values have different characteristics than contracts just below the threshold values. Use HMO and Part D status as your plan characteristics.

  6. Summarize your findings from questions 1-5. What is the effect of increasing a star rating on enrollments? Briefly explain your results.
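
One possible reading of "the RD estimator with a bandwidth" in the questions above is a local linear fit on each side of the threshold, using only observations within the bandwidth, with the effect taken as the gap between the two fitted values at the threshold. The sketch below implements that reading in plain Python. The function names, the uniform weighting, and the synthetic data are assumptions of this sketch; it also assumes ratings round to the nearest half star, so the 3-star threshold sits at a raw score of 2.75.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return the OLS (intercept, slope) from regressing y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("running variable has no variation on one side")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def rd_estimate(
    score: Sequence[float],
    outcome: Sequence[float],
    cutoff: float,
    bandwidth: float,
) -> float:
    """Local linear RD estimate: jump in the fitted outcome at the cutoff."""
    below: List[Tuple[float, float]] = []
    above: List[Tuple[float, float]] = []
    for s, y in zip(score, outcome):
        centered = s - cutoff  # so each intercept is the fit at the cutoff
        if -bandwidth <= centered < 0:
            below.append((centered, y))
        elif 0 <= centered <= bandwidth:
            above.append((centered, y))
    if len(below) < 2 or len(above) < 2:
        raise ValueError("need at least two observations on each side")
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_above - intercept_below


# Synthetic data with a built-in jump of 0.5 at 2.75 (illustration only).
scores = [2.64, 2.68, 2.72, 2.74, 2.76, 2.80, 2.84, 2.86, 3.10]
outcomes = [1 + 2 * (s - 2.75) + (0.5 if s >= 2.75 else 0.0) for s in scores]
print(round(rd_estimate(scores, outcomes, cutoff=2.75, bandwidth=0.125), 6))
```

Calling rd_estimate in a loop over the bandwidths listed in question 3 would give the series needed for the sensitivity graph. Standard errors are not computed here; a regression package would be the more practical route for the full assignment.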